Abstract. Floyd languages (FL), alias Operator Precedence Languages, have recently
received renewed attention thanks to their closure properties and local parsability,
which allow one to apply automatic verification techniques (e.g. model checking) and
parallel and incremental parsing. They properly include various other classes, notably
Visibly Pushdown Languages. In this paper we provide a characterization of FL in terms
of a monadic second-order logic (MSO), in the same style as Büchi's characterization of
regular languages. We prove the equivalence between automata recognizing FL and the MSO
formalization.

Keywords: Operator precedence languages, Deterministic Context-Free lan- 
guages, Monadic Second-Order Logic, Pushdown automata. 



1 Introduction 

Floyd languages (FL), as we recently renamed Operator Precedence Languages after 
their inventor, were originally introduced to support deterministic parsing of program- 
ming and other artificial languages: by taking inspiration from the structure of arith- 
metic expressions, which gives precedence to multiplicative operations w.r.t. additive 
ones, Robert Floyd defined an operator precedence matrix (OPM) associated with a 
context-free (operator) grammar. When the OPM is free of conflicts it is easy to build a 
deterministic shift-reduce algorithm that associates any language sentence with a unique 
syntax tree [1]. FL and related grammars (FG) were also studied with different
motivations, such as grammar inference. This led to the discovery of interesting closure
properties that are not enjoyed by more general context-free (CF) languages [2]. After
these initial results the interest in FL properties decayed for several decades, probably
due to the advent of more expressive grammars, such as LR ones [3], which also allow for
efficient deterministic parsing.

Recently, however, we revitalized our interest in FL on the basis of two rather
unexpected remarks. First, and rather incidentally, we noted that a newer class of
deterministic CF languages, namely Visibly Pushdown Languages (VPL), and other connected
families [4,5,6], are a proper subclass of FL. VPL have been introduced and investigated
[7] with the main motivation to extend to them the same or similar automatic analysis
techniques (notably, model checking) that have been so successful for regular languages.
The major features that made them quite successful in the literature are the following:
despite being recognized by infinite-state machines (a specialized class of pushdown
automata), they enjoy practically all closure properties exhibited by regular languages;
they can be defined by a suitable logic formalism that extends in a fairly natural way the
classical Monadic Second Order (MSO) logic characterization introduced by Büchi for finite
state automata [8]. These features, paired with the decidability of the emptiness problem
shared by all CF languages, make them amenable to the application of typical model
checking techniques. When we realized that VPL are a subclass of FL characterized by a
precise "shape" of OPM, we also investigated other closure properties that were not yet
known: by joining old results of decades ago [9] with new ones [2], it turns out that FL
enjoy the same closure properties w.r.t. main operations such as Boolean ones,
concatenation, Kleene *, etc. as regular languages and VPL. Thus, FL too are amenable to
a significant extension of model checking techniques.

A second major motivation that renewed our interest in FL (which, however, has a lesser
impact on the present research) is their locality principle, which makes them much better
suited than other deterministic CF languages for parallel and incremental (parsing)
techniques: unlike more general languages, in fact, the parsing of a substring w of a
string x can be carried out independently of the "context" of w within x; we feel that in
the era of multicore machines the minor loss in expressive power of FG w.r.t., say, LR
ones is more than compensated by the gain in efficiency of (possibly incremental)
analysis that can be obtained by exploiting parallelism [10].

In our path of "rediscovering FL and their properties", we also filled a fairly
surprising hole in previous literature, namely the lack of an automata family that
perfectly matches FG in terms of generative power: Floyd Automata (FA) are reported in
[11] and, with more details and precision, in [12].

In this paper we provide the "last tile of the puzzle", i.e., a complete characterization
of FL in terms of a suitable MSO, so that, as with regular languages and VPL, one can,
for instance, state a language property by means of a MSO formula and then automatically
verify whether a given FA accepts a language that enjoys that property. Our new MSO logic
is certainly inspired by the original approach of [8], as is the technique to
automatically derive a FA from a given formula; as it happened also with other previous
"extensions" of properties and techniques to the FL family, however, we had to face some
new technical difficulties which sharply departed from the original approaches for both
regular languages and VPL [8], [13]. In this case the main difference between finite
state automata and VPA on one side and FA on the other is that the former are real-time
machines, i.e., they read an input character at every move, whereas FA are not; thus,
properties expressed in terms of character positions cannot exploit the fact that to any
position there corresponds one and only one state of the automaton. In some sense the
logic formalization of a FL must encode the corresponding parsing algorithm, which is far
from the trivial one of regular languages and VPL, whose strings have a shape isomorphic
to the corresponding syntax tree.

The paper is structured as follows: Section 2 provides the necessary background about FL
and their automata. Section 3 defines a MSO over strings and provides two symmetric
constructions to derive an equivalent FA from a MSO formula and conversely. Section 4
offers some conclusions and hints for future work.

2 Preliminaries



FL are normally defined through their generating grammars [1,14]: in this paper, however,
we characterize them through their accepting automata [12,11], which are the natural way
to state equivalence properties with logic characterization. Nevertheless we
assume some familiarity with classical language theory concepts such as context-free 
grammar, parsing, shift-reduce algorithm, syntax tree []3] . 

Let $\Sigma = \{a_1, \ldots, a_n\}$ be an alphabet. The empty string is denoted
$\varepsilon$. We use a special symbol # not in $\Sigma$ to mark the beginning and the
end of any string. This is consistent with the typical operator parsing technique that
requires the look-back and look-ahead of one character to determine the next parsing
action [1].

Definition 1. An operator precedence matrix (OPM) $M$ over an alphabet $\Sigma$ is a
partial function $(\Sigma \cup \{\#\})^2 \to \{\lessdot, \doteq, \gtrdot\}$, that with
each ordered pair $(a,b)$ associates the OP relation $M_{a,b}$ holding between $a$ and
$b$. We call the pair $(\Sigma, M)$ an operator precedence alphabet (OP). Relations
$\lessdot, \doteq, \gtrdot$ are named yields precedence, equal in precedence, takes
precedence, respectively. By convention, the initial # can only yield precedence, and
other symbols can only take precedence on the ending #.

If $M_{a,b} = \circ$, where $\circ \in \{\lessdot, \doteq, \gtrdot\}$, we write
$a \circ b$. For $u, v \in \Sigma^*$ we write $u \circ v$ if $u = xa$ and $v = by$ with
$a \circ b$. $M$ is complete if $M_{a,b}$ is defined for every $a$ and $b$ in $\Sigma$.
Moreover, in the following we assume that $M$ is acyclic, which means that
$c_1 \doteq c_2 \doteq \ldots \doteq c_k \doteq c_1$ does not hold for any
$c_1, c_2, \ldots, c_k \in \Sigma$, $k \geq 1$. See [9,2,12] for a discussion on this
hypothesis.

Definition 2. A nondeterministic Floyd automaton (FA) is a tuple
$\mathcal{A} = \langle \Sigma, M, Q, I, F, \delta \rangle$ where:

- $(\Sigma, M)$ is a precedence alphabet,

- $Q$ is a set of states (disjoint from $\Sigma$),

- $I, F \subseteq Q$ are sets of initial and final states, respectively,

- $\delta : Q \times (\Sigma \cup Q) \to 2^Q$ is the transition function.

The transition function is the union of two disjoint functions:

\[ \delta_{push} : Q \times \Sigma \to 2^Q \qquad \delta_{flush} : Q \times Q \to 2^Q \]

A nondeterministic FA can be represented by a graph with $Q$ as the set of vertices and
$\Sigma \cup Q$ as the set of edge labelings: there is an edge from state $q$ to state
$p$ labelled by $a \in \Sigma$ if and only if $p \in \delta_{push}(q, a)$, and there is
an edge from state $q$ to state $p$ labelled by $r \in Q$ if and only if
$p \in \delta_{flush}(q, r)$. To distinguish flush transitions from push transitions we
denote the former ones by a double arrow.

To define the semantics of the automaton, we introduce some notations. We use letters
$p, q, p_i, q_i, \ldots$ for states in $Q$ and we set
$\Sigma' = \{a' \mid a \in \Sigma\}$; symbols in $\Sigma'$ are called marked symbols. Let
$\Gamma = (\Sigma \cup \Sigma' \cup \{\#\}) \times Q$; we denote symbols in $\Gamma$ as
$[a\ q]$, $[a'\ q]$, or $[\#\ q]$, respectively. We set
$smb([a\ q]) = smb([a'\ q]) = a$, $smb([\#\ q]) = \#$, and
$st([a\ q]) = st([a'\ q]) = st([\#\ q]) = q$.

A configuration of a FA is any pair
$C = \langle B_1 B_2 \ldots B_n,\ a_1 a_2 \ldots a_m \rangle$, where $B_i \in \Gamma$ and
$a_i \in \Sigma \cup \{\#\}$. The first component represents the contents of the stack,
while the second component is the part of input still to be read.

A computation is a finite sequence of moves $C \vdash C_1$; there are three kinds of
moves, depending on the precedence relation between $smb(B_n)$ and $a_1$:

(push) if $smb(B_n) \doteq a_1$ then
$C_1 = \langle B_1 \ldots B_n [a_1\ q],\ a_2 \ldots a_m \rangle$, with
$q \in \delta_{push}(st(B_n), a_1)$;

(mark) if $smb(B_n) \lessdot a_1$ then
$C_1 = \langle B_1 \ldots B_n [a_1'\ q],\ a_2 \ldots a_m \rangle$, with
$q \in \delta_{push}(st(B_n), a_1)$;

(flush) if $smb(B_n) \gtrdot a_1$ then let $i$ be the greatest index such that
$smb(B_i) \in \Sigma'$;
$C_1 = \langle B_1 \ldots B_{i-2} [smb(B_{i-1})\ q],\ a_1 a_2 \ldots a_m \rangle$, with
$q \in \delta_{flush}(st(B_n), st(B_{i-1}))$.

Finally, we say that a configuration $[\#\ q_I]$ is starting if $q_I \in I$ and a
configuration $[\#\ q_F]$ is accepting if $q_F \in F$. The language accepted by the
automaton is defined as:

\[ L(\mathcal{A}) = \{ x \mid \langle [\#\ q_I],\ x\# \rangle \vdash^{*} \langle [\#\ q_F],\ \# \rangle,\ q_I \in I,\ q_F \in F \}. \]

Notice that transition function $\delta_{push}$ is used to perform both push and mark
moves. To distinguish them, in the graphical representation of a FA we will use a solid
arrow to denote mark moves in the state diagram.

The deterministic version of FA is defined along the usual lines. 

Definition 3. A FA is deterministic if $I$ is a singleton, and the ranges of
$\delta_{push}$ and $\delta_{flush}$ are both $Q$ rather than $2^Q$.

In [12] we proved in a constructive way that nondeterministic FA have the same
expressive power as the deterministic ones and both are equivalent to the original Floyd
grammars.
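
The three kinds of moves can be made concrete with a small simulator for the deterministic case. The following Python sketch is only an illustration under stated assumptions: the OPM is a dictionary over pairs of symbols, the two transition functions are Python callables, and a stack entry is a triple (symbol, marked, state). It also assumes that, on a flush, the entry below the removed ones keeps its symbol and its mark and only has its state updated. The arithmetic-like OPM and the two-state automaton (which tracks the parity of the + symbols) are hypothetical examples, not taken from the paper.

```python
def accepts(opm, delta_push, delta_flush, q0, finals, word):
    """Run a deterministic Floyd automaton on `word`; True iff accepted."""
    stack = [("#", False, q0)]            # starting configuration [# q0]
    inp = list(word) + ["#"]              # the input is x#
    pos = 0
    while True:
        top_sym, _, top_state = stack[-1]
        a = inp[pos]
        if len(stack) == 1 and pos == len(inp) - 1:
            return top_state in finals    # configuration <[# q], #>
        rel = opm.get((top_sym, a))
        if rel in ("<", "="):             # mark ('<') or push ('='): read a
            if a == "#":
                return False              # only '>' is allowed on the ending #
            stack.append((a, rel == "<", delta_push(top_state, a)))
            pos += 1
        elif rel == ">":                  # flush: the input head does not move
            while len(stack) > 1:
                _, marked, _ = stack.pop()
                if marked:                # topmost marked entry removed
                    break
            else:
                return False              # malformed OPM: nothing to flush
            sym, mark, state = stack.pop()
            stack.append((sym, mark, delta_flush(top_state, state)))
        else:
            return False                  # undefined relation: reject

# Illustrative OPM over {n, +, *}: * takes precedence over +.
OPM = {("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<",
       ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
       ("+", "n"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "#"): ">",
       ("*", "n"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "#"): ">"}

def d_push(q, a):
    """States 0/1 record the parity of the '+' symbols read so far."""
    return 1 - q if a == "+" else q

def d_flush(q, p):
    """q is the state on top of the stack, p the state below the flushed chain."""
    return q

for w in ["n+n+n", "n+n", "n*n", "nn"]:
    print(w, accepts(OPM, d_push, d_flush, 0, {0}, w))
```

Under these assumptions the automaton accepts the strings that are compatible with the illustrative OPM and contain an even number of + symbols; the string nn is rejected because the OPM defines no relation between n and n.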

Example 1. We define here the stack management of a simple programming language that is
able to handle nested exceptions. For simplicity, there are only two procedures, called a
and b. Calls and returns are denoted by $call_a$, $call_b$, $ret_a$, $ret_b$,
respectively. During execution, it is possible to install an exception handler hnd. The
last signal that we use is rst, which is issued when an exception occurs, or after a
correct execution to uninstall the handler. With a rst the stack is "flushed", restoring
the state right before the last hnd. Every hnd not installed during the execution of a
procedure is managed by the OS. We also require that procedures are called in an
environment controlled by the OS, hence calls must always be performed between a hnd/rst
pair (in other words, we do not accept top-level calls). The automaton modeling the above
behavior is presented in Figure 1.

Incidentally, notice that such a language is not a VPL but somewhat extends their
rationale: in fact, whereas VPL allow for unmatched parentheses only at the beginning of
a sentence (for returns) or at the end (for calls), in this language we can have
unmatched $call_a$, $call_b$, $ret_a$, $ret_b$ within a pair hnd, rst.

Fig. 1. Precedence matrix, automaton, example run, and corresponding tree of Example 1.

Definition 4. A simple chain is a string $c_0 c_1 c_2 \ldots c_\ell c_{\ell+1}$, written
as ${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$, such that:
$c_0, c_{\ell+1} \in \Sigma \cup \{\#\}$, $c_i \in \Sigma$ for every
$i = 1, 2, \ldots, \ell$, and
$c_0 \lessdot c_1 \doteq c_2 \ldots c_{\ell-1} \doteq c_\ell \gtrdot c_{\ell+1}$. A
composed chain is a string $c_0 s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell c_{\ell+1}$, where
${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$ is a simple chain, and
$s_i \in \Sigma^*$ is the empty string or is such that ${}^{c_i}[s_i]^{c_{i+1}}$ is a
chain (simple or composed), for every $i = 0, 1, \ldots, \ell$. Such a composed chain
will be written as ${}^{c_0}[s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell]^{c_{\ell+1}}$.

A string $s \in \Sigma^*$ is compatible with the OPM $M$ if ${}^{\#}[s]^{\#}$ is a chain.

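Definition 5. Let $\mathcal{A}$ be a Floyd automaton. We call a support for the simple
chain ${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$ any path in $\mathcal{A}$ of the
form

\[ q_0 \xrightarrow{c_1} q_1 \longrightarrow \cdots \longrightarrow q_{\ell-1} \xrightarrow{c_\ell} q_\ell \stackrel{q_0}{\Longrightarrow} q_{\ell+1} \qquad (1) \]

Notice that the label of the last (and only) flush is exactly $q_0$, i.e. the first state
of the path; this flush is executed because of relation $c_\ell \gtrdot c_{\ell+1}$.

We call a support for the composed chain
${}^{c_0}[s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell]^{c_{\ell+1}}$ any path in $\mathcal{A}$
of the form

\[ q_0 \stackrel{s_0}{\leadsto} q'_0 \xrightarrow{c_1} q_1 \stackrel{s_1}{\leadsto} q'_1 \xrightarrow{c_2} \cdots \xrightarrow{c_\ell} q_\ell \stackrel{s_\ell}{\leadsto} q'_\ell \stackrel{q'_0}{\Longrightarrow} q_{\ell+1} \qquad (2) \]

where, for every $i = 0, 1, \ldots, \ell$:

- if $s_i \neq \varepsilon$, then $q_i \stackrel{s_i}{\leadsto} q'_i$ is a support for
  the chain ${}^{c_i}[s_i]^{c_{i+1}}$, i.e., it can be decomposed as
  $q_i \stackrel{s_i}{\leadsto} q''_i \stackrel{q_i}{\Longrightarrow} q'_i$;

- if $s_i = \varepsilon$, then $q'_i = q_i$.

Notice that the label of the last flush is exactly $q'_0$.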

The chains fully determine the structure of the parsing of any automaton over
$(\Sigma, M)$. Indeed, if the automaton performs the computation

\[ \langle [a\ q_0],\ sb \rangle \vdash^{*} \langle [a\ q],\ b \rangle \]

then ${}^{a}[s]^{b}$ is necessarily a chain over $(\Sigma, M)$ and there exists a support
like (2) with $s = s_0 c_1 \ldots c_\ell s_\ell$ and $q_{\ell+1} = q$.

Furthermore, the above computation corresponds to the parsing by the automaton of the
string $s_0 c_1 \ldots c_\ell s_\ell$ within the context $a, b$. Notice that such context
contains all information needed to build the subtree whose frontier is that string. This
is a distinguishing feature of FL, not shared by other deterministic languages: we call
it the locality principle of Floyd languages.

Example 2. With reference to the tree in Figure 1, the parsing of substring
hnd $call_a$ rst hnd is given by computation

\[ \langle [\#\ q_0],\ hnd\ call_a\ rst\ hnd \rangle \vdash^{*} \langle [\#\ q_0],\ hnd \rangle \]

which corresponds to support
$q_0 \xrightarrow{hnd} q_1 \xrightarrow{call_a} q_1 \stackrel{q_1}{\Longrightarrow} q_1 \xrightarrow{rst} q_1 \stackrel{q_0}{\Longrightarrow} q_0$
of chain ${}^{\#}[hnd\ call_a\ rst]^{hnd}$.

Definition 6. Given the OP alphabet $(\Sigma, M)$, let us consider the FA
$\mathcal{M}(\Sigma, M) = \langle \Sigma, M, \{q\}, \{q\}, \{q\}, \delta_{max} \rangle$
where $\delta_{max}(q, q) = q$, and $\delta_{max}(q, c) = q$, $\forall c \in \Sigma$. We
call $\mathcal{M}(\Sigma, M)$ the Floyd Max-Automaton over $\Sigma, M$.

For a max-automaton $\mathcal{M}(\Sigma, M)$ each chain has a support; since there is a
chain ${}^{\#}[s]^{\#}$ for any string $s$ compatible with $M$, a string is accepted by
$\mathcal{M}(\Sigma, M)$ iff it is compatible with $M$. Also, whenever $M$ is complete,
each string is compatible with $M$, hence accepted by the max-automaton. It is not
difficult to verify that a max-automaton is equivalent to a max-grammar as defined in
[9]; thus, when $M$ is complete both the max-automaton and the max-grammar define the
universal language $\Sigma^*$ by assigning to any string the (unique) structure
compatible with the OPM.

In conclusion, given an OP alphabet, the OPM $M$ assigns a structure to any string in
$\Sigma^*$ compatible with $M$; a FA defined on the OP alphabet selects an appropriate
subset within such a "universe". In some sense this property is yet another variation of
the fundamental Chomsky-Schützenberger theorem.
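
To make the idea of "the structure assigned by the OPM" tangible, the following Python sketch computes, for a compatible string, a bracketed form in which every pair of brackets encloses the body of one chain. It is a plain shift-reduce pass without states, in the spirit of the max-automaton; the dictionary encoding of the OPM, the function name and the arithmetic-like alphabet are assumptions of this example.

```python
def structure(opm, word):
    """Return the chain structure of `word` as a bracketed string,
    or None if `word` is not compatible with `opm`."""
    # Stack entries: [symbol, marked, text already reduced right after
    # this symbol], i.e. the bracketed form of the s_i that follows c_i.
    stack = [["#", False, ""]]
    inp = list(word) + ["#"]
    pos = 0
    while not (len(stack) == 1 and pos == len(inp) - 1):
        a = inp[pos]
        rel = opm.get((stack[-1][0], a))
        if rel in ("<", "=") and a != "#":
            stack.append([a, rel == "<", ""])      # mark or push
            pos += 1
        elif rel == ">":
            body = ""
            while len(stack) > 1:                  # pop the chain body
                sym, marked, follow = stack.pop()
                body = sym + follow + body
                if marked:
                    break
            else:
                return None                        # nothing to reduce
            # The reduced chain hangs after the symbol now on top.
            stack[-1][2] = "[" + stack[-1][2] + body + "]"
        else:
            return None                            # undefined relation
    return stack[0][2]

# Illustrative OPM over {n, +, *}: * takes precedence over +.
OPM = {("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<",
       ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
       ("+", "n"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "#"): ">",
       ("*", "n"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "#"): ">"}

print(structure(OPM, "n+n*n"))   # [[n]+[[n]*[n]]]
print(structure(OPM, "n*n+n"))   # [[[n]*[n]]+[n]]
print(structure(OPM, "nn"))      # None: n and n are not related
```

The two compatible strings receive different nestings although they contain the same symbols, which is exactly the information a FA over this OP alphabet can rely on when selecting a subset of the compatible strings.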

3 Logic characterization of FL 

We are now ready to provide a characterization of FL in terms of a suitable Monadic
Second Order (MSO) logic in the same vein as originally proposed by Büchi for regular
languages and subsequently extended by Alur and Madhusudan for VPL. The essence of the
approach consists in defining language properties in terms of relations between the
positions of characters in the strings: first order variables are used to denote
positions whereas second order ones denote subsets of positions; then, suitable
constructions build an automaton from a given formula and conversely, in such a way that
formula and corresponding automaton define the same language. The extension designed by
[13] introduced a new basic binary predicate $\leadsto$ in the syntax of the MSO logic,
$x \leadsto y$ representing the fact that in positions $x$ and $y$ two matching
parentheses (named call and return, respectively, in their terminology) are located. In
the case of FL, however, we have to face new problems.

- Both finite state automata and VPA are real-time machines, i.e., they read one input
  character at every move; this is not the case with more general machines such as FA,
  which do not advance the input head when performing flush transitions, and may also
  apply many flush transitions before the next push or mark, which are the transitions
  that consume input. As a consequence, whereas in the logic characterization of regular
  and VP languages any first order variable can belong to only one second order variable
  representing an automaton state, in this case (when the automaton performs a flush) the
  same position may correspond to different states and therefore belong to different
  second-order variables.

- In VPL the $\leadsto$ relation is one-to-one, since any call matches with only one
  return, if any, and conversely. In FL, instead, the same position $y$ can be "paired"
  with different positions $x$ in correspondence of many flush transitions with no
  push/mark in between, as it happens for instance when parsing a derivation such as
  $A \Rightarrow^{*} \alpha^k A$, consisting of $k$ immediate derivations
  $A \Rightarrow \alpha A$; symmetrically the same position $x$ can be paired with many
  positions $y$.

In essence our goal is to formalize in terms of MSO formulas a complete parsing
algorithm for FL, a much more complex algorithm than is needed for regular and VP
languages. The first step to achieve our goal is to define a new relation between (first
order variables denoting) the positions in a string.

In some sense the new relation formalizes structural properties of FL strings in the
same way as the $\leadsto$ relation does for VPL; the new relation, however, is more
complex than its VPL counterpart, just as FL are much richer than VPL.

Definition 7. Consider a string $s \in \Sigma^*$ and an OPM $M$. For
$0 \leq x < y \leq |s| + 1$, we write $x \curvearrowright y$ iff there exists a
sub-string of $\#s\#$ which is a chain ${}^{a}[r]^{b}$, such that $a$ is in position $x$
and $b$ is in position $y$.

Example 3. With reference to the string of Figure 1, we have $1 \curvearrowright 3$,
$0 \curvearrowright 4$, $6 \curvearrowright 8$, $4 \curvearrowright 8$, and
$0 \curvearrowright 9$. Notice that, in the parsing of the string, such pairs correspond
to contexts where a reduce operation is executed (they are listed according to their
execution order).

In general $x \curvearrowright y$ implies $y > x + 1$, and a position $x$ may be in such
a relation with more than one position and vice versa. Moreover, if $s$ is compatible
with $M$, then $0 \curvearrowright |s| + 1$.
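
For a string compatible with the OPM, the pairs in this relation can be collected during a single shift-reduce pass, since each of them is the context of one reduction. The following Python sketch does so; it is an illustration under assumptions (dictionary encoding of the OPM, hypothetical function name, arithmetic-like alphabet) and it only covers compatible strings, returning None otherwise.

```python
def chain_pairs(opm, word):
    """Pairs (x, y) in the chain relation of #word#, in reduction order.
    Position 0 is the initial #, positions 1..len(word) are the characters,
    position len(word)+1 is the final #. Returns None if `word` is not
    compatible with `opm`."""
    symbols = ["#"] + list(word) + ["#"]
    last = len(symbols) - 1
    stack = [(0, False)]                   # entries: (position, marked)
    pairs = []
    y = 1
    while not (len(stack) == 1 and y == last):
        rel = opm.get((symbols[stack[-1][0]], symbols[y]))
        if rel in ("<", "=") and y != last:
            stack.append((y, rel == "<"))  # mark or push: read position y
            y += 1
        elif rel == ">":
            while len(stack) > 1:          # remove the body of the chain
                _, marked = stack.pop()
                if marked:
                    break
            else:
                return None
            pairs.append((stack[-1][0], y))  # context of the reduction
        else:
            return None
    return pairs

# Illustrative OPM over {n, +, *}: * takes precedence over +.
OPM = {("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<",
       ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
       ("+", "n"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "#"): ">",
       ("*", "n"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "#"): ">"}

print(chain_pairs(OPM, "n+n*n"))
# [(0, 2), (2, 4), (4, 6), (2, 6), (0, 6)]
```

In this run position 6 is paired with positions 4, 2 and 0, and position 0 is paired with positions 2 and 6, which illustrates that the relation is not one-to-one; the last pair is the one relating 0 and |s| + 1.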

3.1 A Monadic Second-Order Logic over Operator Precedence Alphabets 

Let $(\Sigma, M)$ be an OP alphabet. According to Definition 7 it induces the relation
$\curvearrowright$ over positions of characters in any word in $\Sigma^*$. Let us define
a countable infinite set of first-order variables $x, y, \ldots$ and a countable infinite
set of monadic second-order (set) variables $X, Y, \ldots$.

Definition 8. The $MSO_{\Sigma,M}$ (monadic second-order logic over $(\Sigma, M)$) is
defined by the following syntax:

\[ \varphi := a(x) \mid x \in X \mid x < y \mid x \curvearrowright y \mid x = y + 1 \mid \neg\varphi \mid \varphi \vee \varphi \mid \exists x.\varphi \mid \exists X.\varphi \]

where $a \in \Sigma$, $x, y$ are first-order variables and $X$ is a set variable.

$MSO_{\Sigma,M}$ formulae are interpreted over $(\Sigma, M)$ strings and the positions of
their characters in the following natural way:

- first-order variables are interpreted over positions of the string;

- second-order variables are interpreted over sets of positions;

- $a(x)$ is true iff the character in position $x$ is $a$;

- $x \curvearrowright y$ is true iff $x$ and $y$ satisfy Definition 7;

- the other logical symbols have the usual meaning.

A sentence is a formula without free variables. The language of all strings
$s \in \Sigma^*$ such that $\#s\# \models \varphi$ is denoted by $L(\varphi)$:

\[ L(\varphi) = \{ s \in \Sigma^* \mid \#s\# \models \varphi \} \]

where $\models$ is the standard satisfaction relation.
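
The interpretation above can be mirrored by a brute-force evaluator, which is practical only for very short strings but makes the semantics of each construct explicit. In the following Python sketch formulas are nested tuples, the chain relation is passed in as a set of pairs, and all names (holds, conj, the tags "sym", "in", "lt", "arrow", "succ", "ex1", "ex2") are assumptions of this example rather than notation from the paper.

```python
from itertools import chain, combinations

def holds(phi, word, arrow, env=None):
    """Brute-force check that #word# satisfies phi.
    `arrow` is the set of pairs (x, y) in the chain relation, supplied by
    the caller; positions range over 0 .. len(word) + 1."""
    env = env or {}
    s = "#" + word + "#"
    positions = range(len(s))
    op = phi[0]
    if op == "sym":                        # a(x)
        return s[env[phi[2]]] == phi[1]
    if op == "in":                         # x in X
        return env[phi[1]] in env[phi[2]]
    if op == "lt":                         # x < y
        return env[phi[1]] < env[phi[2]]
    if op == "arrow":                      # x and y are in the chain relation
        return (env[phi[1]], env[phi[2]]) in arrow
    if op == "succ":                       # x = y + 1
        return env[phi[1]] == env[phi[2]] + 1
    if op == "not":
        return not holds(phi[1], word, arrow, env)
    if op == "or":
        return (holds(phi[1], word, arrow, env)
                or holds(phi[2], word, arrow, env))
    if op == "ex1":                        # first-order quantifier
        return any(holds(phi[2], word, arrow, {**env, phi[1]: p})
                   for p in positions)
    if op == "ex2":                        # second-order (set) quantifier
        subsets = chain.from_iterable(
            combinations(positions, k) for k in range(len(s) + 1))
        return any(holds(phi[2], word, arrow, {**env, phi[1]: set(c)})
                   for c in subsets)
    raise ValueError(f"unknown connective: {op}")

def conj(*fs):
    """Derived connective: a and b is not(not a or not b)."""
    out = fs[0]
    for f in fs[1:]:
        out = ("not", ("or", ("not", out), ("not", f)))
    return out

# Some '+' lies strictly inside a pair of positions in the chain relation.
phi = ("ex1", "x", ("ex1", "y", ("ex1", "z", conj(
    ("arrow", "x", "y"), ("lt", "x", "z"), ("lt", "z", "y"),
    ("sym", "+", "z")))))
arrow = {(0, 2), (2, 4), (0, 4)}   # assumed chain relation for #n+n#
print(holds(phi, "n+n", arrow))
```

The set quantifier enumerates all subsets of positions, so the cost is exponential in the length of the string; the automata construction of the following subsections is what makes the logic usable in practice.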



Logic Characterization of Floyd Languages 






Example 4. Consider the language of Example 1, with the structure implied by its OPM. The
following sentence defines it:

\[ \forall z \left( \big( call_a(z) \vee ret_a(z) \vee call_b(z) \vee ret_b(z) \big) \Rightarrow \exists x, y \big( x \curvearrowright y \wedge x < z < y \wedge hnd(x+1) \wedge rst(y-1) \big) \right) \]



Example 5. Consider again Example 1. If we want to add the additional constraint that
procedure b cannot directly install handlers (e.g. for security reasons), we may state it
through the following formula:

\[ \forall z \big( hnd(z) \Rightarrow \neg\exists u \big( call_b(u) \wedge (u + 1 = z \vee u \curvearrowright z) \big) \big) \]

We are now ready for the main result. 

Theorem 1. A language $L$ over $(\Sigma, M)$ is a FL if and only if there exists a
$MSO_{\Sigma,M}$ sentence $\varphi$ such that $L = L(\varphi)$.



The proof is constructive and structured in the following two subsections. 



3.2 From $MSO_{\Sigma,M}$ to Floyd automata

Proposition 1. Let $(\Sigma, M)$ be an operator precedence alphabet and $\varphi$ be a
$MSO_{\Sigma,M}$ sentence. Then $L(\varphi)$ can be recognized by a Floyd automaton over
$(\Sigma, M)$.

Proof. The proof follows the one by Thomas [8] and is composed of two steps: first the
formula is rewritten so that no predicate symbols nor first order variables are used;
then an equivalent FA is built inductively.

Let $\Sigma$ be $\{a_1, a_2, \ldots, a_n\}$. For each predicate symbol $a_i$ we introduce
a fresh set variable $X_i$, therefore formula $a_i(x)$ will be translated into
$x \in X_i$. Following the standard construction of [8], we also translate every first
order variable into a fresh second order variable with the additional constraint that the
set it represents contains exactly one position.

Let $\varphi'$ be the formula obtained from $\varphi$ by such a translation, and consider
any subformula $\psi$ of $\varphi'$: let
$X_1, X_2, \ldots, X_n, X_{n+1}, \ldots, X_{n+m(\psi)}$ be the (second order) variables
appearing in $\psi$. Recall that $X_1, \ldots, X_n$ represent symbols in $\Sigma$, hence
they are never quantified.

As usual we interpret formulae over strings; in this case we use the alphabet

\[ \Lambda(\psi) = \{ \alpha \in \{0,1\}^{n+m(\psi)} \mid \exists! i \text{ s.t. } 1 \leq i \leq n,\ \alpha_i = 1 \} \]

A string $w \in \Lambda(\psi)^*$, with $|w| = \ell$, is used to interpret $\psi$ in the
following way: the projection over the $j$-th component of $\Lambda(\psi)$ gives an
evaluation $\{1, 2, \ldots, \ell\} \to \{0, 1\}$ of $X_j$, for every
$1 \leq j \leq n + m(\psi)$.

For any $\alpha \in \Lambda(\psi)$, the projection of $\alpha$ over the first $n$
components encodes a symbol in $\Sigma$, denoted as $symb(\alpha)$. The matrix $M$ over
$\Sigma$ can be naturally extended to the OPM $M(\psi)$ over $\Lambda(\psi)$ by defining
$M(\psi)_{\alpha,\beta} = M_{symb(\alpha),symb(\beta)}$ for any
$\alpha, \beta \in \Lambda(\psi)$.
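
The encoding of interpretations as strings of 0/1 tuples can be sketched in a few lines of Python. The alphabet, the function names and the representation of a valuation as a list of sets of positions are assumptions of this example; only the one-hot convention on the first n components and the extension of the OPM through the decoded symbols follow the text above.

```python
from itertools import product

SIGMA = ["n", "+", "*"]                   # illustrative alphabet, n = 3

def letters(m):
    """All tuples of the extended alphabet for m additional set variables:
    0/1 tuples of length n + m with exactly one 1 among the first n."""
    n = len(SIGMA)
    return [t for t in product((0, 1), repeat=n + m) if sum(t[:n]) == 1]

def symb(alpha):
    """Symbol of SIGMA encoded by the first n components of alpha."""
    return SIGMA[alpha[:len(SIGMA)].index(1)]

def extended_relation(opm, alpha, beta):
    """Relation between two tuples, inherited from the encoded symbols."""
    return opm.get((symb(alpha), symb(beta)))

def encode(word, valuation):
    """Encode a word together with a valuation of the additional set
    variables (a list of sets of positions, counted from 1)."""
    return [tuple(int(c == a) for a in SIGMA)
            + tuple(int(p in X) for X in valuation)
            for p, c in enumerate(word, start=1)]

w = encode("n+n", [{1, 3}])               # one extra variable holding {1, 3}
print(w)                                  # [(1, 0, 0, 1), (0, 1, 0, 0), (1, 0, 0, 1)]
print([symb(alpha) for alpha in w])       # ['n', '+', 'n']
print(len(letters(1)))                    # 6 tuples: 3 symbols times 2 bits
```

Because the relation between two tuples depends only on their first n components, erasing one of the remaining components (as done for existential quantification) never changes the precedence structure of the encoded string.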

We now build a FA $\mathcal{A}$ equivalent to $\varphi'$. The construction is inductive
on the structure of the formula: first we define the FA for all atomic formulae. We give
here only the construction for $\curvearrowright$, since for the other ones the
construction is standard and is the same as in [8].

Figure 2 represents the Floyd automaton for atomic formula
$\psi = X_i \curvearrowright X_j$ (notice that $i, j > n$). For the sake of brevity, we
use notation $[X_i]$ to represent the set of all tuples in $\Lambda(\psi)$ having the
$i$-th component equal to 1; notation $[\bar{X}]$ represents the set of all tuples in
$\Lambda(\psi)$ having both $i$-th and $j$-th components equal to 0. The automaton, after
a generic sequence of moves corresponding to visiting an irrelevant portion of the syntax
tree, when reading $X_i$ performs either a mark or a push move, depending on whether
$X_i$ is a leftmost leaf of the tree or not; then it visits the subsequent subtree ending
with a flush labeled $q_1$; at this point, if it reads $X_j$, it accepts anything else
that will follow the examined fragment.




Fig. 2. Floyd automaton for atomic formula $\psi = X_i \curvearrowright X_j$



Then, a natural inductive path leads to the construction of the automaton associated
with a generic MSO formula: the disjunction of two subformulae can be obtained by
building the union automaton of the two corresponding automata; similarly for negation.
The existential quantification of $X_i$ is obtained by projection erasing the $i$-th
component. Notice that all matrices $M(\psi)$ are well defined for any $\psi$ because the
first $n$ components of the alphabet are never erased by quantification. The alphabet of
the automaton equivalent to $\varphi'$ is $\Lambda(\varphi') = \{0, 1\}^n$, which is in
bijection with $\Sigma$.

3.3 From Floyd automata to $MSO_{\Sigma,M}$

Let $\mathcal{A}$ be a deterministic Floyd automaton over $(\Sigma, M)$. We build a
$MSO_{\Sigma,M}$ sentence $\varphi$ such that $L(\mathcal{A}) = L(\varphi)$. The main
idea for encoding the behavior of the Floyd automaton is based on assigning the states
visited during its run to positions along the same lines stated by Büchi [8] and extended
for VPL [13]. Unlike finite state automata and VPA, however, Floyd automata do not work
on-line. Hence, it is not possible to assign a single state to every position. Let
$Q = \{q_0, q_1, \ldots, q_N\}$ be the states of $\mathcal{A}$ with $q_0$ initial; as
usual, we will use second order variables to encode them. We shall need three different
sets of second order variables, namely $P_0, P_1, \ldots, P_N$,
$M_0, M_1, \ldots, M_N$ and $F_0, F_1, \ldots, F_N$: set $P_i$ contains those positions
of $s$ where state $q_i$ may be assumed after a push transition. $M_i$ and $F_i$
represent the state reached after a flush: $F_i$ contains the positions where the flush
occurs, whereas $M_i$ contains the positions preceding the corresponding mark. Notice
that any position belongs to only one $P_i$, whereas it may belong to several $F_i$ or
$M_i$ (see Figure 3).




Fig. 3. Example trees with a position $t$ belonging to more than one $M_i$ (left) and
$F_i$ (right).



We show that $\mathcal{A}$ accepts a string $s$ iff $\#s\# \models \varphi$, where

\[ \varphi := \exists P_0, P_1, \ldots, P_N, M_0, M_1, \ldots, M_N, F_0, F_1, \ldots, F_N, e\ \ \varphi' \]

\[ \varphi' := 0 \in P_0 \wedge \bigvee_{q_f \in F} e \in F_f \wedge \neg\exists x (e + 1 < x) \wedge \#(e + 1) \wedge \varphi_\delta \wedge \varphi_{exist} \wedge \varphi_{unique}. \qquad (3) \]

The first clause in $\varphi'$ encodes the initial state, whereas the second, third and
fourth ones encode the final states. We use variable $e$ to refer to the end of $s$,
i.e., $e$ equals the last position $|s|$. The remaining clauses are defined in the
following: the fifth one encodes the transition function; the last ones together encode
the fact that there exists exactly one state that may be assumed by a push transition in
any position, and the correspondence between mark and flush transitions.

For convenience we introduce in formulae precedence relations and other shortcut 
notations, presented next. 

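Notation. In the following, when considering a chain ${}^{a}[s]^{b}$ we assume
$s = s_0 c_1 s_1 \ldots c_\ell s_\ell$, with ${}^{a}[c_1 c_2 \ldots c_\ell]^{b}$ a simple
chain (any $s_g$ may be empty). Also let $x_g$ be the position of symbol $c_g$, for
$g = 1, 2, \ldots, \ell$ and, for the sake of uniformity, set $c_0 = a$, $x_0 = 0$,
$c_{\ell+1} = b$, and $x_{\ell+1} = |s| + 1$.

\[ x \circ y := \bigvee_{M_{a,b} = \circ} a(x) \wedge b(y), \quad \text{for } \circ \in \{\lessdot, \doteq, \gtrdot\} \]

\[ Tree(x, z, w, y) := x \curvearrowright y \wedge (x + 1 = z \vee x \curvearrowright z) \wedge \neg\exists t (x < t < z \wedge x \curvearrowright t) \wedge (w + 1 = y \vee w \curvearrowright y) \wedge \neg\exists t (w < t < y \wedge t \curvearrowright y) \]

\[ Succ_k(x, y) := x + 1 = y \wedge x \in P_k \]

\[ Next_k(x, y) := x \curvearrowright y \wedge x \in M_k \wedge y - 1 \in F_k \]

\[ Flush_k(x, y) := x \curvearrowright y \wedge x \in M_k \wedge y - 1 \in F_k \wedge \exists z, w \left( Tree(x, z, w, y) \wedge \bigvee_{i=0}^{N} \bigvee_{j=0}^{N} \big( \delta(q_i, q_j) = q_k \wedge (Succ_i(w, y) \vee Next_i(w, y)) \wedge (Succ_j(x, z) \vee Next_j(x, z)) \big) \right) \]

\[ Tree_{i,j}(x, z, w, y) := Tree(x, z, w, y) \wedge (Succ_i(w, y) \vee Flush_i(w, y)) \wedge (Succ_j(x, z) \vee Flush_j(x, z)) \]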



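Remarks. If $x \curvearrowright y$ then there exist (unique) $z$ and $w$ such that
$Tree(x, z, w, y)$ holds. In particular, if ${}^{a}[s]^{b}$ is a simple chain, then
$0 \curvearrowright \ell + 1$ and $Tree(0, 1, \ell, \ell + 1)$ holds; if
${}^{a}[s]^{b}$ is a composed chain, then $0 \curvearrowright |s| + 1$ and
$Tree(0, x_1, x_\ell, x_{\ell+1})$ holds. If $s_0 = \varepsilon$ then $x_1 = 1$, and if
$s_\ell = \varepsilon$ then $x_\ell = |s|$.

By definition, $Tree_{i,j}(x, z, w, y) \wedge q_k = \delta(q_i, q_j)$ implies
$Flush_k(x, y)$.

If ${}^{a}[c_1 c_2 \ldots c_\ell]^{b}$ is a simple chain with support

\[ q_i = q_{t_0} \xrightarrow{c_1} q_{t_1} \xrightarrow{c_2} \cdots \xrightarrow{c_\ell} q_{t_\ell} \stackrel{q_{t_0}}{\Longrightarrow} q_k \qquad (4) \]

then $Tree_{t_\ell,t_0}(0, 1, \ell, \ell + 1)$ and $Flush_k(0, \ell + 1)$ hold; if
${}^{a}[s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell]^{b}$ is a composed chain with support

\[ q_i = q_{t_0} \stackrel{s_0}{\leadsto} q_{f_0} \xrightarrow{c_1} q_{t_1} \stackrel{s_1}{\leadsto} q_{f_1} \xrightarrow{c_2} \cdots \xrightarrow{c_\ell} q_{t_\ell} \stackrel{s_\ell}{\leadsto} q_{f_\ell} \stackrel{q_{f_0}}{\Longrightarrow} q_k \qquad (5) \]

then by induction we can see that
$Tree_{f_\ell,f_0}(0, |s_0| + 1, |s_0 \ldots c_\ell|, |s| + 1)$ and
$Flush_k(0, |s| + 1)$ hold.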

Formula $\varphi_\delta$ is the conjunction of the following formulae, organized in
forward formulae and backward formulae:

Forward formulae.

\[ \varphi_{push\_fw} := \forall x, y \bigwedge_{i=0}^{N} \Big( (x \lessdot y \vee x \doteq y) \wedge a(y) \wedge (Succ_i(x, y) \vee Flush_i(x, y)) \Rightarrow y \in P_{\delta(q_i, a)} \Big) \]

\[ \varphi_{flush\_fw} := \forall x, z, w, y \bigwedge_{i=0}^{N} \bigwedge_{j=0}^{N} \Big( Tree_{i,j}(x, z, w, y) \Rightarrow x \in M_{\delta(q_i, q_j)} \wedge y - 1 \in F_{\delta(q_i, q_j)} \Big) \]


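Backward formulae.

\[ \varphi_{push\_bw1} := \forall x, y \bigwedge_{k=0}^{N} \Big( (x \lessdot y \vee x \doteq y) \wedge a(y) \wedge y \in P_k \wedge x + 1 = y \Rightarrow \bigvee_{i=0}^{N} \big( Succ_i(x, y) \wedge \delta(q_i, a) = q_k \big) \Big) \]

\[ \varphi_{push\_bw2} := \forall x, y \bigwedge_{k=0}^{N} \Big( (x \lessdot y \vee x \doteq y) \wedge a(y) \wedge y \in P_k \wedge x \curvearrowright y \Rightarrow \bigvee_{i=0}^{N} \big( Flush_i(x, y) \wedge \delta(q_i, a) = q_k \big) \Big) \]

\[ \varphi_{flush\_bwM} := \forall x \bigwedge_{k=0}^{N} \Big( x \in M_k \Rightarrow \exists y, z, w \bigvee_{i=0}^{N} \bigvee_{j=0}^{N} \big( Tree_{i,j}(x, z, w, y) \wedge \delta(q_i, q_j) = q_k \big) \Big) \]

\[ \varphi_{flush\_bwF} := \forall y \bigwedge_{k=0}^{N} \Big( y \in F_k \Rightarrow \exists x, z, w \bigvee_{i=0}^{N} \bigvee_{j=0}^{N} \big( Tree_{i,j}(x, z, w, y) \wedge \delta(q_i, q_j) = q_k \big) \Big) \]

\[ \varphi_{flush\_bw} := \forall x, z, w, y \bigwedge_{k=0}^{N} \bigwedge_{i=0}^{N} \bigwedge_{j=0}^{N} \Big( Flush_k(x, y) \wedge Tree_{i,j}(x, z, w, y) \Rightarrow \delta(q_i, q_j) = q_k \Big) \]
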
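Formula $\varphi_{exist}$ is the conjunction of the following formulae:

\[ \varphi_{push\_exist} := \forall x \bigvee_{i=0}^{N} x \in P_i \]

\[ \varphi_{flush\_exist} := \forall x, y \Big( x \curvearrowright y \Rightarrow \bigvee_{k=0}^{N} Flush_k(x, y) \Big) \]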

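Formula $\varphi_{unique}$ is the conjunction of the following formulae:

\[ \varphi_{push\_unique} := \forall x \bigwedge_{i=0}^{N} \Big( x \in P_i \Rightarrow \neg \bigvee_{j=0}^{N} (j \neq i \wedge x \in P_j) \Big) \]

\[ \varphi_{flush\_unique} := \forall x, y \bigwedge_{k=0}^{N} \Big( Flush_k(x, y) \Rightarrow \neg \bigvee_{j=0}^{N} (j \neq k \wedge Flush_j(x, y)) \Big) \]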

Remark 1. If (3) holds, then for each $x, y$, $Succ_i(x, y) \vee Flush_i(x, y)$ implies
that such $i$ is unique. Indeed, $Succ_i(x, y)$ and $Flush_i(x, y)$ are mutually
exclusive; if $Flush_i(x, y)$ then such $i$ is unique by $\varphi_{flush\_unique}$; if
$Succ_i(x, y)$ then $y = x + 1$ and $x \in P_i$, thus such $i$ is unique by
$\varphi_{push\_unique}$.

Now let $C = {}^{a}[s]^{b}$ be a chain in $(\Sigma, M)$ and set

\[ \psi_{i,k} := \exists P_0, P_1, \ldots, P_N\ \exists M_0, M_1, \ldots, M_N\ \exists F_0, F_1, \ldots, F_N\ \exists e\ \big( 0 \in P_i \wedge Flush_k(0, e + 1) \wedge \varphi_\delta \wedge \varphi_{exist} \wedge \varphi_{unique} \big). \]

The following lemmata hold.

Lemma 1. If there exists a support $q_i \stackrel{s}{\leadsto} q_k$ for the chain $C$ in
$\mathcal{A}$, then $asb \models \psi_{i,k}$.

Proof. We prove the lemma by induction on the structure of chains. 

Base step Let C be a simple chain and its support be decomposed as in (0). 

Define e - I, and P§,P\, . . . ,P$, Mq, . . . , M^, Fq, . . . , F^ as follows. M/, is empty 
except for M* = {0}; Ff, is empty except for Fk = {€}; for every x = . . .€, let f/, 
contain x iff f A - = h (i.e., jc e P, t ); finally let P§(q t ,b) contain I + 1 if a < b or a = 

Then we show that i/r^ is satisfied by checking every subformula in <p$, <p eX ist> f unique- 

L <P P ush.fw is satisfied Vx = y - 1 < I with y e P^,^ A a(y). Then S(q t( ,q to ) = q k 
guarantees Flush,t(0, i + 1); and diqt, b) = q, M guarantees t + 1 e Pf+\. 
Remark. Even if J{ is deterministic, some chains could have different supports. 
However, every support produces exactly one assignment P t0 , P h , . . . , Fk, M k that 
satisfies i// to ,k- 

2. f flush Jv/ is satisfied for x = 0,z = \,w = t,y = t + \ with P^P^F^Mt (for all 

other cases, it is -i Tree,-. ; (jic, z, w, y)). 
3- (fipushjnvi is satisfied in the natural way for every y < t\ for y = ( + 1, it is x > y, 

x + 1 = y, which implies -i(x < y V x = y) and the antecedent is false. 
4. <fipushj>w2, for every pair (x,y) ^ (0, 1 + 1) is satisfied with -a r> y; for x — 0, 

y = ^+ 1, if x>y the antecedent is false, otherwise it is satisfied with Flush^O, (+ 1), 

°<t+i ■ 

5- <Pfiushj>wM and <PflushJwF are satisfied with jc = and y = € + I, respectively. (For 
jc > 0, y < £ the antecedents are false.) 

6. $\varphi_{flush\_bw}$ is satisfied in a vacuous way (false antecedent) for $(x, y) \neq (0, \ell + 1)$. For
$x = 0$, $y = \ell + 1$ it is satisfied with $i = t_\ell$, $j = t_0$, $F_k$.

7. $\varphi_{push\_exist}$, $\varphi_{push\_unique}$, $\varphi_{flush\_exist}$, and $\varphi_{flush\_unique}$ are always satisfied, because a) the
chain has a support, b) $\mathcal{A}$ is deterministic.

8. $\psi_{t_0,k}$ is finally satisfied with $\mathrm{Flush}_k(0, \ell + 1)$.
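
The assignment used in the base step can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `simple_chain_assignment`, its parameters, and the encoding of $P_h$, $M_h$, $F_h$ as lists of Python sets indexed by $h$ are hypothetical and not part of the paper. The input is the list of indices $t_0, \ldots, t_\ell$ of the states visited by the support, the index $k$ of the state reached by the final flush, and (only when $a \lessdot b$ or $a \doteq b$) the index of $\delta(q_k, b)$.

```python
def simple_chain_assignment(t, k, num_states, push_index=None):
    """Build the sets of the base step for a simple chain.

    t          -- hypothetical encoding: indices t_0, ..., t_l of the states
                  visited by the support before the final flush
    k          -- index of the state reached by the flush
    num_states -- number of states, i.e. N + 1
    push_index -- index of delta(q_k, b), given only if a push on b applies
    """
    ell = len(t) - 1
    P = [set() for _ in range(num_states)]
    M = [set() for _ in range(num_states)]
    F = [set() for _ in range(num_states)]
    # P_h contains x iff t_x = h.
    for x, h in enumerate(t):
        P[h].add(x)
    # M_h is empty except for M_k = {0}; F_h is empty except for F_k = {l}.
    M[k].add(0)
    F[k].add(ell)
    # Position l + 1 is recorded only when b is pushed after the flush.
    if push_index is not None:
        P[push_index].add(ell + 1)
    return ell, P, M, F
```

For instance, calling the function with `t = [0, 1, 2]`, `k = 3` and four states places each position $x \le 2$ in $P_{t_x}$, position 0 in $M_3$ and position 2 in $F_3$, and sets $e = 2$.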

Induction step 

Let now $C$ be a composed chain and let its support be decomposed as in (0). Let us
consider the case $s_0 \neq \varepsilon \neq s_\ell$ (other cases are similar and simpler, therefore omitted).
Thus, $\delta(q_{f_\ell}, q_{f_0}) = q_k$.

Let $e$ be $|s|$. By the inductive hypothesis, for every $g = 0, 1, \ldots, \ell$ such that $s_g \neq \varepsilon$ we
have $c_g s_g c_{g+1} \models \psi_{t_g,f_g}$: let $P_0^g, \ldots, P_N^g$, $M_0^g, \ldots, M_N^g$, $F_0^g, \ldots, F_N^g$ be (the naturally
shifted versions of) an assignment that satisfies $\psi_{t_g,f_g}$. In particular this means $x_g \in
P_{t_g}^g \cup M_{f_g}^g$, $x_{g+1} - 1 \in F_{f_g}^g$ and $\mathrm{Flush}_{f_g}(x_g, x_{g+1})$. Then define $P_h$, $M_h$, $F_h$ as follows. Let
$P_h$ be the union of all $P_h^g$, $M_h$ include all $M_h^g$, $F_h$ include all $F_h^g$. Also let $M_k$ contain
$x_0$ and $F_k$ contain $x_\ell$. Finally let $P_{\delta(q_k, b)}$ contain $\ell + 1$ if $a \lessdot b$ or $a \doteq b$.
Then we show that $\psi_{i,k}$ is satisfied by checking every subformula in $\varphi_\delta$, $\varphi_{exist}$, $\varphi_{unique}$.
By the inductive hypothesis, all axioms are satisfied within every $s_g$. Thus, we only have
to prove that they are satisfied in positions $x_g$, for $0 \le g \le \ell$. The proof of satisfaction
of most axioms is clerical. Thus, we consider only a meaningful sample thereof.

1. $\varphi_{push\_fw}$ is satisfied for $x = x_{g-1}$ and $y = x_g$ since $\mathrm{Succ}_{f_{g-1}}(x_{g-1}, x_g) \vee \mathrm{Flush}_{f_{g-1}}(x_{g-1}, x_g)$
holds and $\delta(q_{f_{g-1}}, c_g) = q_{t_g}$, $x_g \in P_{t_g}$.
2. $\varphi_{flush\_fw}$ is satisfied for $\mathrm{Tree}_{f_\ell,f_0}(0, x_1, x_\ell, x_{\ell+1})$ since $0 \in M_k$, $x_\ell \in F_k$, $\delta(q_{f_\ell}, q_{f_0}) =
q_k$.

3. $\varphi_{push\_bw2}$ is satisfied for $x_g \in P_{t_g}$ and $x_{g-1} \curvearrowright x_g$ (if $s_{g-1} \neq \varepsilon$), since $\mathrm{Flush}_{f_{g-1}}(x_{g-1}, x_g)$
and $\delta(q_{f_{g-1}}, c_g) = q_{t_g}$.

4. $\varphi_{flush\_bwM}$, $\varphi_{flush\_bwF}$, $\varphi_{flush\_bw}$ are satisfied for $\mathrm{Tree}_{f_\ell,f_0}(0, x_1, x_\ell, x_{\ell+1})$ by $\delta(q_{f_\ell}, q_{f_0}) =
q_k$.

5. $\varphi_{push\_unique}$ and $\varphi_{flush\_unique}$ are satisfied because $\mathcal{A}$ is deterministic.
Hence $asb \models \psi_{i,k}$. □
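
The way the induction step assembles the global assignment from the assignments of the inner chains can be sketched as follows. This is a hedged illustration, not the paper's formal construction: the name `merge_shifted_assignments`, the list-of-sets encoding, and the reading of "naturally shifted" as adding the start position $x_g$ of each inner chain are all assumptions of this sketch. It covers only the steps stated in the proof: the union of the shifted sets, plus the extra elements $x_0 \in M_k$ and $x_\ell \in F_k$.

```python
def merge_shifted_assignments(subs, offsets, k, x_first, x_last, num_states):
    """Union of shifted sub-assignments, as in the induction step.

    subs    -- hypothetical encoding: one (P_g, M_g, F_g) triple per
               non-empty s_g, each a list of num_states sets of positions
               relative to the inner chain c_g s_g c_{g+1}
    offsets -- position x_g of c_g in the whole string, one per triple
    k       -- index of the state reached by the outermost flush
    """
    P = [set() for _ in range(num_states)]
    M = [set() for _ in range(num_states)]
    F = [set() for _ in range(num_states)]
    for (P_g, M_g, F_g), offset in zip(subs, offsets):
        for h in range(num_states):
            # Shift every position by x_g, then take the union.
            P[h] |= {x + offset for x in P_g[h]}
            M[h] |= {x + offset for x in M_g[h]}
            F[h] |= {x + offset for x in F_g[h]}
    # M_k additionally contains x_0 and F_k additionally contains x_l.
    M[k].add(x_first)
    F[k].add(x_last)
    return P, M, F
```

The push of $b$ after the outermost flush is left out here; it would add one more position to $P_{\delta(q_k, b)}$ exactly as in the base step.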

Lemma 2. For every chain $C$, $asb \models \psi_{i,k}$ implies that there exists a support $q_i \stackrel{s}{\leadsto} q_k$
for $C$ in $\mathcal{A}$.

Proof. Again, we prove the lemma by induction on the structure of chains. 

Base step First consider the induction bases with $s_g = \varepsilon$ for every $g = 0, 1, \ldots, \ell$, i.e.,
${}^a[s]^b$ is a simple chain with $s = c_1 c_2 \cdots c_\ell$. Let $asb \models \psi_{i,k}$. Hence there is a suitable
assignment for $e$, $P_h$, $M_h$, $F_h$ such that $0 \in P_i \wedge \mathrm{Flush}_k(0, e + 1) \wedge \varphi_\delta \wedge \varphi_{exist} \wedge \varphi_{unique}$
holds true. Clearly $e$ is $|s|$. For every $g$, let $t_g$ be the index such that $g \in P_{t_g}$. Notice that
$t_g$ is unique by $\varphi_{push\_unique}$ and in particular $t_0 = i$. Hence $t_g$ is the unique index such that
$\mathrm{Succ}_{t_g}(g, g + 1)$. Then, by $\varphi_{push\_bw1}$ with $y = g + 1 \le \ell$, we have $\delta(q_{t_g}, c_{g+1}) = q_{t_{g+1}}$. Moreover,
since $\mathrm{Flush}_k(0, \ell + 1) \wedge \mathrm{Tree}_{t_\ell,t_0}(0, 1, \ell, \ell + 1)$, by $\varphi_{flush\_bw}$ we get $\delta(q_{t_\ell}, q_{t_0}) = q_k$. Hence
we have built a support like (|4).
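
The first step of this argument, reading the indices $t_g$ off a satisfying assignment, can be sketched in Python. The names `read_off_state_indices` and `push_steps` and the list-of-sets encoding of $P_0, \ldots, P_N$ are hypothetical and introduced only for illustration. The sketch checks the uniqueness that the proof obtains from $\varphi_{push\_unique}$ and lists the index pairs $(t_g, t_{g+1})$ that the push transitions of the rebuilt support must realize; it does not consult any transition function.

```python
def read_off_state_indices(P, ell):
    """Return [t_0, ..., t_l], where t_g is the unique h with g in P[h].

    P is assumed to be a list of sets of positions, one per state index.
    """
    indices = []
    for g in range(ell + 1):
        owners = [h for h, positions in enumerate(P) if g in positions]
        if len(owners) != 1:
            # Uniqueness fails: this is not a satisfying assignment.
            raise ValueError(f"position {g} belongs to {len(owners)} sets")
        indices.append(owners[0])
    return indices


def push_steps(t):
    """Pairs (t_g, t_{g+1}) linked by a push on c_{g+1} in the support."""
    return list(zip(t, t[1:]))
```

Applied to the output of an assignment built from a support, `read_off_state_indices` returns the same index sequence the support visited, which mirrors the correspondence between supports and assignments used by the two lemmata.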

Induction step Now consider the general case with $s = s_0 c_1 s_1 \ldots c_\ell s_\ell$ and again consider
the assignment for $P_h$, $M_h$, $F_h$ that satisfies $\psi_{i,k}$. For every $g$, let $t_g$ be the index
such that $x_g \in P_{t_g}$, and notice that $t_g$ is unique by $\varphi_{push\_unique}$; in particular $t_0 = i$.
For $g = 0, 1, \ldots, \ell$, since $x_g \curvearrowright x_{g+1} \vee x_{g+1} = x_g + 1$, let $f_g$ be the index such that
$\mathrm{Flush}_{f_g}(x_g, x_{g+1}) \vee \mathrm{Succ}_{f_g}(x_g, x_{g+1})$. Notice that such $f_g$ is unique by $\varphi_{unique}$ (see Remark 1);
moreover $s_g = \varepsilon$ implies $f_g = t_g$. Hence if $s_g \neq \varepsilon$, we have $c_g s_g c_{g+1} \models \psi_{t_g,f_g}$
and, by the inductive hypothesis, there exists a support $q_{t_g} \stackrel{s_g}{\leadsto} q_{f_g}$ in $\mathcal{A}$.

For every $g < \ell$, since $f_g$ is unique, by applying $\varphi_{push\_bw1}$ with $y = x_{g+1}$
we get $\delta(q_{f_g}, c_{g+1}) = q_{t_{g+1}}$. Moreover, since $\mathrm{Tree}_{f_\ell,f_0}(x_0, x_1, x_\ell, x_{\ell+1}) \wedge \mathrm{Flush}_k(x_0, x_{\ell+1})$, by
$\varphi_{flush\_bw}$ we get $\delta(q_{f_\ell}, q_{f_0}) = q_k$. Hence we have built a support like (0) and this concludes
the proof. □

Proposition 2. Let $(\Sigma, M)$ be an operator precedence alphabet and $\mathcal{A}$ be a Floyd automaton
over $(\Sigma, M)$. Then there exists an $\mathrm{MSO}_{\Sigma,M}$ sentence $\varphi$ such that $L(\mathcal{A}) = L(\varphi)$.



Proof. Let $\varphi$ be the $\mathrm{MSO}_{\Sigma,M}$ sentence defined in (3). We show that $L(\mathcal{A}) = L(\varphi)$ by applying
the previous lemmata. Consider an accepting computation of $s$ in $\mathcal{A}$. Then there
exists a support $q_0 \leadsto q_k$ for the chain ${}^\#[s]^\#$, with $q_k$ a final state; hence by Lemma 1
$\#s\# \models \psi_{0,k}$. Vice versa, let $s \in L(\varphi)$, then $\#s\# \models \psi_{0,k}$ with $q_k$ a final state; hence
Lemma 2 implies that there exists a path $q_0 \leadsto q_k$; and this concludes the proof. □

4 Conclusions and future work 

This paper somewhat completes a research path that began more than four decades
ago and was resumed only recently with new (and old) goals. FL enjoy most of the
nice properties that made regular languages highly appreciated and applied to achieve 
decidability and, therefore, automatic analysis techniques. In this paper we added to 
the above collection the ability to formalize and analyze FL by means of suitable MSO 
logic formulae. 

New research topics, however, stimulate further investigation. Here we briefly mention
only two mutually related ones. On the one hand, FA devoted to analyzing strings
should be extended in the usual way into suitable transducers. They could be applied,
e.g., to translate typical mark-up languages such as XML, HTML, LaTeX, . . . into their
end-user view. Such languages, which also motivated the definition of VPL, could
be classified as "explicit parenthesis languages" (EPL), i.e., languages whose syntactic
structure is explicitly apparent in the input string. On the other hand, we plan to start
from the remark that VPL are characterized by a well precise shape of the OPM Q to 
characterize more general classes of such EPL: for instance, the language of Example 1
is an EPL that is not a VPL. Another notable feature of FL, in fact, is
that they are as suitable for parsing languages with implicit syntactic structure, such as
most programming languages, as for analyzing and translating EPL.